Information-theoretic analysis of protein sequences shows that amino acids self-cluster.

نویسندگان

  • Yudong Cai
  • C T J Dodson
  • Andrew J Doig
  • Olaf Wolkenhauer
چکیده

We analyse for each of 20 amino acids X the statistics of spacings between consecutive occurrences of X within the well-characterized Saccharomyces cerevisiae genome. The occurrences of amino acids may exhibit near random, clustered or smoothed out behaviour, like one-dimensional stochastic processes along the protein chain. If amino acids are distributed randomly within a sequence, then they follow a Poisson process, and a histogram of the number of observations of each gap size would asymptotically follow a negative exponential distribution. The novelty of the present approach lies in the use of differential geometric methods to quantify information on sequencing of amino acids and groups of amino acids, via the sequences of intervals between their occurrences. The differential geometry arises from an information-theoretic distance function on the two-dimensional space of stochastic processes subordinate to gamma distributions-which latter include the random process as a special case. We find that maximum-likelihood estimates of parametric statistics show that all 20 amino acids tend to cluster, some substantially. In other words, the frequencies of short gap lengths tend to be higher and the variance of the gap lengths is greater than expected by chance. This may be because localizing amino acids with the same properties may favour secondary structure formation or transmembrane domains. Gap sizes of 1 or 2 are generally disfavoured, 1 strongly so. The only exceptions to this are Gln and Ser, as a result of poly(Gln) or poly(Ser) sequences. There are preferences for gaps of 4 and 7 that can be attributed to alpha -helices. In particular, a favoured gap of 7 for Leu is found in coiled coils. Our method contributes to the characterization of whole sequences by extracting and quantifying stable stochastic features.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A graph-based clustering method applied to protein sequences

The number of amino acid sequences is increasing very rapidly in the protein databases like Swiss-Prot, Uniprot, PIR and others, but the structure of only some amino acid sequences are found in the Protein Data Bank. Thus, an important problem in genomics is automatically clustering homologous protein sequences when only sequence information is available. Here, we use graph theoretic techniques...

متن کامل

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

متن کامل

Phylogenetic and sequence analysis of the growth hormone gene of two sturgeons, Huso huso and Acipenser Gueldenstaedtii

In this study, the cDNA Growth Hormone (cGH) of the Belugasturgeon (Husohuso) and Russian sturgeon (Acipensergueldenstaedtii) were cloned and sequenced, and phylogenetic relationships were examined using nucleic acid and amino acid sequences. The nucleotide sequence of the Beluga GH has an open reading frame of 645 nucleotides encoding a protein 214 amino acid residues. The signal peptide cleav...

متن کامل

An Evolutionary Relationship Between Stearoyl-CoA Desaturase (SCD) Protein Sequences Involved in Fatty Acid Metabolism

Background: Stearoyl-CoA desaturase (SCD) is a key enzyme that converts saturated fatty acids (SFAs) to monounsaturated fatty acids (MUFAs) in fat biosynthesis. Despite being crucial for interpreting SCDs’ roles across species, the evolutionary relationship of SCD proteins across species has yet to be elucidated. This study aims to present this evolutionary relationship based on amino aci...

متن کامل

Effect of Various Levels of Protein in Diet Based on Total and Digestible Amino Acids on Performance of Cobb Strain Broilers

In order to study the effects of various levels of protein in diet based on total and digestible amino acids on performance and carcass traits of broilers, an experiment was conducted as a completely randomized design with 288 broiler chicks from Cobb 500 strain in 6 treatments, each treatment consist of 3 replicates with 16 chicks per replicate. Treatments were the diets as follow: 1) diet for...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of theoretical biology

دوره 218 4  شماره 

صفحات  -

تاریخ انتشار 2002